
Conversation

@valeriy42 (Contributor) commented Sep 26, 2025

This PR fixes the flaky test muted in #93228 by pinning the hyperparameters to values that consistently work. Since the test exercises alias fields rather than the training algorithm, pinning the hyperparameters is safe.
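For context: data frame analytics regression normally tunes its hyperparameters automatically, and that search is the source of the rare nondeterminism. A minimal sketch of what pinning looks like through the REST API, with illustrative values only (the job id, index names, and numbers below are hypothetical, not the ones this PR sets; the actual test configures them through the Java test harness):

```console
PUT _ml/data_frame/analytics/alias-fields-regression
{
  "source": { "index": "source-index" },
  "dest": { "index": "dest-index" },
  "analysis": {
    "regression": {
      "dependent_variable": "field_2",
      "training_percent": 90,
      "eta": 0.5,
      "max_trees": 100,
      "feature_bag_fraction": 1.0,
      "lambda": 1.0,
      "gamma": 1.0
    }
  }
}
```

With every hyperparameter supplied explicitly, the optimizer has nothing left to search over, so repeated runs train the same model.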

Closes #93228

@valeriy42 added the >test (Issues or PRs that are addressing/adding tests), :ml (Machine learning), and Team:ML (Meta label for the ML team) labels Sep 26, 2025
@valeriy42 marked this pull request as ready for review September 26, 2025 13:52
@elasticsearchmachine (Collaborator) commented

Pinging @elastic/ml-core (Team:ML)

@valeriy42 added the auto-backport (Automatically create backport pull requests when merged), v8.19.5, v9.1.5, v8.18.8, and v9.0.8 labels Sep 26, 2025
@DonalEvans (Contributor) commented Sep 26, 2025

Would changing the test so that field_2 is 10 times field_1 instead of 2 times also help with making it more consistent, since it would make the difference between using the correct value and an incorrect one easier to spot?

The way we determine the prediction error also seems like it could be improved, since using the absolute value of the difference between the predicted and actual value is very forgiving for small numbers and very strict for large numbers (being allowed to be off by 3 when the predicted value is 2 is a huge margin of error compared to being allowed to be off by 3 when the predicted value is 600).

Perhaps instead of the absolute difference between predicted and actual, the relative difference could be used, since that would allow the final assertion to be that the predictions are on average within some % of the actual value, rather than being within some absolute range. Something like this perhaps (I chose 1% arbitrarily, I don't know how lenient the test actually needs to be):

```java
int expectedValue = 2 * featureValue;
predictionErrorPercentSum += Math.abs(predictionValue - expectedValue) / expectedValue;
...
double meanPredictionErrorPercent = predictionErrorPercentSum / sourceData.getHits().getHits().length;
// Assert that the predicted values are on average within 1% of the expected values
assertThat(meanPredictionErrorPercent, lessThanOrEqualTo(0.01));
```
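For concreteness, a self-contained version of that sketch; the class, method name, and array inputs are hypothetical stand-ins for the values the real test extracts from the search hits, and computing the expected value as a double guards against integer division:

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.lessThanOrEqualTo;

final class RelativeErrorAssertions {

    // Hypothetical helper: featureValues and predictionValues stand in for the
    // feature and prediction values pulled out of the search hits in the real test.
    static void assertMeanRelativeError(int[] featureValues, double[] predictionValues) {
        double relativeErrorSum = 0.0;
        for (int i = 0; i < featureValues.length; i++) {
            // field_2 is defined as 2 * field_1 in the test data
            double expectedValue = 2.0 * featureValues[i];
            relativeErrorSum += Math.abs(predictionValues[i] - expectedValue) / expectedValue;
        }
        double meanRelativeError = relativeErrorSum / featureValues.length;
        // On average, predictions should be within 1% of the expected values
        assertThat(meanRelativeError, lessThanOrEqualTo(0.01));
    }
}
```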

@valeriy42 (Contributor, Author) commented

> Would changing the test so that field_2 is 10 times field_1 instead of 2 times also help with making it more consistent, since it would make the difference between using the correct value and an incorrect one easier to spot?
>
> The way we determine the prediction error also seems like it could be improved, since using the absolute value of the difference between the predicted and actual value is very forgiving for small numbers and very strict for large numbers (being allowed to be off by 3 when the predicted value is 2 is a huge margin of error compared to being allowed to be off by 3 when the predicted value is 600).
>
> Perhaps instead of the absolute difference between predicted and actual, the relative difference could be used, since that would allow the final assertion to be that the predictions are on average within some % of the actual value, rather than being within some absolute range. Something like this perhaps (I chose 1% arbitrarily, I don't know how lenient the test actually needs to be):
>
> ```java
> int expectedValue = 2 * featureValue;
> predictionErrorPercentSum += Math.abs(predictionValue - expectedValue) / expectedValue;
> ...
> double meanPredictionErrorPercent = predictionErrorPercentSum / sourceData.getHits().getHits().length;
> // Assert that the predicted values are on average within 1% of the expected values
> assertThat(meanPredictionErrorPercent, lessThanOrEqualTo(0.01));
> ```

@DonalEvans, thank you for your comment. This is a useful technique, and we employ similar assertions elsewhere in the test suite, where we actually verify that the algorithm learns and predicts correct values. The goal of this test is to verify that alias fields are used correctly. Without fixed hyperparameters, the prediction was (on very rare occasions) completely off, so the test would fail regardless of whether the assertion threshold is expressed in relative or absolute terms.

@DonalEvans (Contributor) commented

> @DonalEvans, thank you for your comment. This is a useful technique, and we employ similar assertions elsewhere in the test suite, where we actually verify that the algorithm learns and predicts correct values. The goal of this test is to verify that alias fields are used correctly. Without fixed hyperparameters, the prediction was (on very rare occasions) completely off, so the test would fail regardless of whether the assertion threshold is expressed in relative or absolute terms.

Thanks for the explanation!

@elasticsearchmachine (Collaborator) commented

💚 Backport successful

All target branches succeeded: 8.19, 9.1, 8.18, 9.0

valeriy42 added four commits to valeriy42/elasticsearch that referenced this pull request Oct 1, 2025, each titled:
…e the test (elastic#135541)
elasticsearchmachine pushed a commit that referenced this pull request Oct 1, 2025
…e the test (#135541) (#135769)
elasticsearchmachine pushed a commit that referenced this pull request Oct 1, 2025
…e the test (#135541) (#135770)
elasticsearchmachine pushed a commit that referenced this pull request Oct 1, 2025
…e the test (#135541) (#135768)
elasticsearchmachine pushed a commit that referenced this pull request Oct 1, 2025
…e the test (#135541) (#135771)
Labels

auto-backport (Automatically create backport pull requests when merged), :ml (Machine learning), Team:ML (Meta label for the ML team), >test (Issues or PRs that are addressing/adding tests), v8.18.8, v8.19.5, v9.0.8, v9.1.5, v9.2.0
Development

Successfully merging this pull request may close: [CI] RegressionIT testAliasFields failing